{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "hdr01",
   "metadata": {},
   "source": [
    "# Distribute Election Votes\n",
    "\n",
    "Generates county-level and zip-code-level election vote files from raw\n",
    "precinct-level vote data.  Vehicle registration overlap between precincts\n",
    "and zip codes is used as a geographic proxy to distribute precinct votes\n",
    "across zip codes.\n",
    "\n",
    "Only zip codes present in both the ACS and vehicle-count pipelines (with\n",
    "non-zero population and non-zero vehicle count) are used as allocation\n",
    "targets.  This ensures the election output shares an identical geographic\n",
    "footprint with the other output files.\n",
    "\n",
    "**Inputs** (from `../Input/`):\n",
    "- `PrecinctVoteCounts.csv` — 2024 General Election votes per precinct\n",
    "- `2024AlRegVehicleCountsByCountyPrecinctZipCode.csv` — vehicle counts per (precinct, zip code) cell\n",
    "- `VehicleCountsByEVAFuelType_ALZipCodes.csv` — fuel-type vehicle counts per zip code\n",
    "\n",
    "**Inputs** (from `Process/`):\n",
    "- `ACS_ALZipCodes.csv` — ACS demographic data per zip code (for population)\n",
    "\n",
    "**Outputs** (to `Process/`):\n",
    "- `GeneralElectionVotes_ALCounties.csv`\n",
    "- `GeneralElectionVotes_ALZipCodes.csv`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr02",
   "metadata": {},
   "source": [
    "## 1. Configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cfg01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Vehicle file:      ..\\Input\\2024AlRegVehicleCountsByCountyPrecinctZipCode.csv\n",
      "Votes file:        ..\\Input\\PrecinctVoteCounts.csv\n",
      "Vehicle fuel file: ..\\Input\\VehicleCountsByEVAFuelType_ALZipCodes.csv\n",
      "ACS zips file:     ACS_ALZipCodes.csv\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "import polars as pl\n",
    "\n",
    "INPUT_DIR = Path(\"../Input\")\n",
    "\n",
    "# Input files\n",
    "VEHICLE_FILE = INPUT_DIR / \"2024AlRegVehicleCountsByCountyPrecinctZipCode.csv\"\n",
    "VOTES_FILE   = INPUT_DIR / \"PrecinctVoteCounts.csv\"\n",
    "VEH_FUEL_FILE = INPUT_DIR / \"VehicleCountsByEVAFuelType_ALZipCodes.csv\"\n",
    "ACS_ZIPS_FILE = Path(\"ACS_ALZipCodes.csv\")  # in Process/\n",
    "\n",
    "# Candidate columns in the votes file\n",
    "DEM_COL   = \"G24PREDHAR\"   # Harris (D)\n",
    "REP_COL   = \"G24PRERTRU\"   # Trump (R)\n",
    "OTHER_COLS = [\"G24PREIKEN\", \"G24PREIOLI\", \"G24PREISTE\", \"G24PREOWRI\"]\n",
    "VOTE_COLS  = [DEM_COL, REP_COL] + OTHER_COLS\n",
    "\n",
    "# Output column names (must match existing file format)\n",
    "OUT_DEM   = \"Dem. Popular Vote\"\n",
    "OUT_REP   = \"Rep. Popular Vote\"\n",
    "OUT_OTHER = \"All Other Popular Vote\"\n",
    "OUT_TOTAL = \"Total Popular Vote\"\n",
    "OUT_DEM_P = \"Dem. Proportion\"\n",
    "OUT_REP_P = \"Rep. Proportion\"\n",
    "\n",
    "print(f\"Vehicle file:      {VEHICLE_FILE}\")\n",
    "print(f\"Votes file:        {VOTES_FILE}\")\n",
    "print(f\"Vehicle fuel file: {VEH_FUEL_FILE}\")\n",
    "print(f\"ACS zips file:     {ACS_ZIPS_FILE}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr03",
   "metadata": {},
   "source": [
    "## 2. Load Input Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "load01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Applied Morgan COUNTY COURTHOUSE fix to NULL-precinct rows\n",
      "\n",
      "Vehicles: 4,666 rows, 7 cols\n",
      "  Unique precincts: 1,947\n",
      "  Unique zip codes: 759\n",
      "\n",
      "Votes: 1,947 rows, 10 cols\n",
      "  Unique precincts: 1,947\n"
     ]
    }
   ],
   "source": [
    "vehicles = pl.read_csv(VEHICLE_FILE)\n",
    "votes    = pl.read_csv(VOTES_FILE)\n",
    "\n",
    "# Manual fix: Morgan county has NULL-precinct rows in the vehicle file\n",
    "# that correspond to the \"COUNTY COURTHOUSE\" precinct in the vote file.\n",
    "morgan_null = (pl.col(\"County\") == \"Morgan\") & (pl.col(\"PrecinctID\") == \"NULL\")\n",
    "vehicles = vehicles.with_columns(\n",
    "    pl.when(morgan_null).then(pl.lit(\"Morgan-:-COUNTY COURTHOUSE\")).otherwise(pl.col(\"PrecinctID\")).alias(\"PrecinctID\"),\n",
    "    pl.when(morgan_null).then(pl.lit(\"Morgan\")).otherwise(pl.col(\"PrecinctCounty\")).alias(\"PrecinctCounty\"),\n",
    "    pl.when(morgan_null).then(pl.lit(\"COUNTY COURTHOUSE\")).otherwise(pl.col(\"Precinct\")).alias(\"Precinct\"),\n",
    "    pl.when(morgan_null).then(pl.lit(\"103\")).otherwise(pl.col(\"CountyFP\")).alias(\"CountyFP\"),\n",
    ")\n",
    "print(\"Applied Morgan COUNTY COURTHOUSE fix to NULL-precinct rows\")\n",
    "\n",
    "print(f\"\\nVehicles: {vehicles.shape[0]:,} rows, {vehicles.shape[1]} cols\")\n",
    "print(f\"  Unique precincts: {vehicles['PrecinctID'].n_unique():,}\")\n",
    "print(f\"  Unique zip codes: {vehicles['ZipCode'].n_unique():,}\")\n",
    "print()\n",
    "print(f\"Votes: {votes.shape[0]:,} rows, {votes.shape[1]} cols\")\n",
    "print(f\"  Unique precincts: {votes['UNIQUE_ID'].n_unique():,}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd7dxuoli0h",
   "metadata": {},
   "source": [
    "## 2b. Filter to ACS/Vehicle zip-code footprint\n",
    "\n",
    "Restrict the vehicle registration table to the zip codes that appear in\n",
    "both the ACS and vehicle-count pipelines with non-zero population and\n",
    "non-zero total vehicle count.  This ensures distribution weights are\n",
    "computed only over valid target zip codes — votes that would have gone\n",
    "to excluded zips are redistributed proportionally to the remaining zips\n",
    "within each precinct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "345604n5obm",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ACS zip codes (pop > 0):          638\n",
      "Vehicle fuel-type zips (count > 0): 757\n",
      "Intersection (valid targets):     635\n",
      "\n",
      "Vehicle registration rows: 4,666 -> 4,520 (dropped 146)\n",
      "Unique zip codes: 759 -> 635 (dropped 124)\n"
     ]
    }
   ],
   "source": [
    "# ACS zip codes with non-zero population\n",
    "acs_zips = pl.read_csv(ACS_ZIPS_FILE)\n",
    "acs_pop = acs_zips.select(\n",
    "    pl.col(\"Location\").cast(pl.Int64).alias(\"ZipCode\"),\n",
    "    pl.col(\"B01003_E001\").alias(\"Population\"),\n",
    ")\n",
    "acs_valid = set(acs_pop.filter(pl.col(\"Population\") > 0)[\"ZipCode\"].to_list())\n",
    "\n",
    "# Vehicle fuel-type zip codes with non-zero total count\n",
    "veh_fuel = pl.read_csv(VEH_FUEL_FILE)\n",
    "veh_totals = veh_fuel.group_by(\"ZipCode\").agg(pl.col(\"Count\").sum())\n",
    "veh_valid = set(veh_totals.filter(pl.col(\"Count\") > 0)[\"ZipCode\"].to_list())\n",
    "\n",
    "# Intersection = valid output zip codes\n",
    "valid_zips = acs_valid & veh_valid\n",
    "\n",
    "before_rows = vehicles.height\n",
    "before_zips = vehicles[\"ZipCode\"].n_unique()\n",
    "vehicles = vehicles.filter(pl.col(\"ZipCode\").is_in(valid_zips))\n",
    "after_rows = vehicles.height\n",
    "after_zips = vehicles[\"ZipCode\"].n_unique()\n",
    "\n",
    "print(f\"ACS zip codes (pop > 0):          {len(acs_valid)}\")\n",
    "print(f\"Vehicle fuel-type zips (count > 0): {len(veh_valid)}\")\n",
    "print(f\"Intersection (valid targets):     {len(valid_zips)}\")\n",
    "print(f\"\\nVehicle registration rows: {before_rows:,} -> {after_rows:,} \"\n",
    "      f\"(dropped {before_rows - after_rows:,})\")\n",
    "print(f\"Unique zip codes: {before_zips} -> {after_zips} \"\n",
    "      f\"(dropped {before_zips - after_zips})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr04",
   "metadata": {},
   "source": [
    "## 3. Data Quality — Precinct Match Report"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "qual01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Matched precincts:         1,946\n",
      "In vehicles only:          1\n",
      "In votes only:             1\n",
      "\n",
      "Votes in unmatched precincts: 0 / 2,265,090 (0.00%)\n",
      "  Unmatched: ['Coffee-:-Fort Rucker']\n"
     ]
    }
   ],
   "source": [
    "vehicle_precincts = set(vehicles[\"PrecinctID\"].unique().to_list())\n",
    "vote_precincts    = set(votes[\"UNIQUE_ID\"].unique().to_list())\n",
    "\n",
    "matched    = vehicle_precincts & vote_precincts\n",
    "in_veh_only = vehicle_precincts - vote_precincts\n",
    "in_vote_only = vote_precincts - vehicle_precincts\n",
    "\n",
    "print(f\"Matched precincts:         {len(matched):,}\")\n",
    "print(f\"In vehicles only:          {len(in_veh_only):,}\")\n",
    "print(f\"In votes only:             {len(in_vote_only):,}\")\n",
    "\n",
    "# Quantify lost votes from unmatched precincts\n",
    "if in_vote_only:\n",
    "    lost = votes.filter(pl.col(\"UNIQUE_ID\").is_in(list(in_vote_only)))\n",
    "    total_all = votes.select(pl.sum_horizontal(VOTE_COLS)).sum().item()\n",
    "    total_lost = lost.select(pl.sum_horizontal(VOTE_COLS)).sum().item()\n",
    "    print(f\"\\nVotes in unmatched precincts: {total_lost:,} / {total_all:,} \"\n",
    "          f\"({total_lost / total_all * 100:.2f}%)\")\n",
    "    if len(in_vote_only) <= 20:\n",
    "        print(f\"  Unmatched: {sorted(in_vote_only)}\")\n",
    "else:\n",
    "    print(\"\\nAll vote precincts matched — no lost votes.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr05",
   "metadata": {},
   "source": [
    "## 4. Distribution Weights"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "wt01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Max weight-sum deviation from 1.0: 1.1102230246251565e-16\n",
      "Weighted table: 4,520 rows (precinct-zip pairs)\n"
     ]
    }
   ],
   "source": [
    "# Compute total vehicles per precinct\n",
    "precinct_totals = (\n",
    "    vehicles\n",
    "    .group_by(\"PrecinctID\")\n",
    "    .agg(pl.col(\"VehicleCount\").sum().alias(\"PrecinctTotal\"))\n",
    ")\n",
    "\n",
    "# Join totals back and compute weight\n",
    "weighted = (\n",
    "    vehicles\n",
    "    .select([\"PrecinctID\", \"ZipCode\", \"County\", \"VehicleCount\"])\n",
    "    .join(precinct_totals, on=\"PrecinctID\")\n",
    "    .with_columns(\n",
    "        (pl.col(\"VehicleCount\") / pl.col(\"PrecinctTotal\")).alias(\"Weight\")\n",
    "    )\n",
    ")\n",
    "\n",
    "# Verify weights sum to 1.0 per precinct\n",
    "weight_sums = (\n",
    "    weighted\n",
    "    .group_by(\"PrecinctID\")\n",
    "    .agg(pl.col(\"Weight\").sum().alias(\"WeightSum\"))\n",
    ")\n",
    "max_deviation = (weight_sums[\"WeightSum\"] - 1.0).abs().max()\n",
    "print(f\"Max weight-sum deviation from 1.0: {max_deviation}\")\n",
    "assert max_deviation < 1e-10, f\"Weight sums deviate from 1.0 by {max_deviation}\"\n",
    "\n",
    "print(f\"Weighted table: {weighted.shape[0]:,} rows (precinct-zip pairs)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr06",
   "metadata": {},
   "source": [
    "## 5. Distribute Votes to Zip Codes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "zip01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Zip-code vote table: 635 rows\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>G24PREDHAR</th><th>G24PRERTRU</th><th>G24PREIKEN</th><th>G24PREIOLI</th><th>G24PREISTE</th><th>G24PREOWRI</th></tr><tr><td>i64</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>35004</td><td>1389</td><td>4439</td><td>29</td><td>13</td><td>14</td><td>19</td></tr><tr><td>35005</td><td>1852</td><td>1334</td><td>11</td><td>1</td><td>8</td><td>4</td></tr><tr><td>35006</td><td>165</td><td>1458</td><td>3</td><td>1</td><td>1</td><td>4</td></tr><tr><td>35007</td><td>4221</td><td>8107</td><td>69</td><td>43</td><td>32</td><td>48</td></tr><tr><td>35010</td><td>2416</td><td>6480</td><td>37</td><td>12</td><td>6</td><td>20</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 7)\n",
       "┌─────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐\n",
       "│ ZipCode ┆ G24PREDHAR ┆ G24PRERTRU ┆ G24PREIKEN ┆ G24PREIOLI ┆ G24PREISTE ┆ G24PREOWRI │\n",
       "│ ---     ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        │\n",
       "│ i64     ┆ i64        ┆ i64        ┆ i64        ┆ i64        ┆ i64        ┆ i64        │\n",
       "╞═════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡\n",
       "│ 35004   ┆ 1389       ┆ 4439       ┆ 29         ┆ 13         ┆ 14         ┆ 19         │\n",
       "│ 35005   ┆ 1852       ┆ 1334       ┆ 11         ┆ 1          ┆ 8          ┆ 4          │\n",
       "│ 35006   ┆ 165        ┆ 1458       ┆ 3          ┆ 1          ┆ 1          ┆ 4          │\n",
       "│ 35007   ┆ 4221       ┆ 8107       ┆ 69         ┆ 43         ┆ 32         ┆ 48         │\n",
       "│ 35010   ┆ 2416       ┆ 6480       ┆ 37         ┆ 12         ┆ 6          ┆ 20         │\n",
       "└─────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filter votes to matched precincts only\n",
    "matched_votes = votes.filter(pl.col(\"UNIQUE_ID\").is_in(list(matched)))\n",
    "\n",
    "# Join weighted precinct-zip pairs with vote data\n",
    "distributed = (\n",
    "    weighted\n",
    "    .join(matched_votes, left_on=\"PrecinctID\", right_on=\"UNIQUE_ID\")\n",
    ")\n",
    "\n",
    "# Multiply each vote column by the weight\n",
    "for col in VOTE_COLS:\n",
    "    distributed = distributed.with_columns(\n",
    "        (pl.col(col) * pl.col(\"Weight\")).alias(col)\n",
    "    )\n",
    "\n",
    "# Aggregate by zip code and round to integer\n",
    "zip_votes = (\n",
    "    distributed\n",
    "    .group_by(\"ZipCode\")\n",
    "    .agg([pl.col(c).sum() for c in VOTE_COLS])\n",
    "    .with_columns([pl.col(c).round(0).cast(pl.Int64) for c in VOTE_COLS])\n",
    "    .sort(\"ZipCode\")\n",
    ")\n",
    "\n",
    "print(f\"Zip-code vote table: {zip_votes.shape[0]} rows\")\n",
    "zip_votes.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr07",
   "metadata": {},
   "source": [
    "## 6. Aggregate Votes by County"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "county01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "County vote table: 67 rows\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>County</th><th>G24PREDHAR</th><th>G24PRERTRU</th><th>G24PREIKEN</th><th>G24PREIOLI</th><th>G24PREISTE</th><th>G24PREOWRI</th></tr><tr><td>str</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>&quot;Autauga&quot;</td><td>7439</td><td>20484</td><td>154</td><td>65</td><td>48</td><td>91</td></tr><tr><td>&quot;Baldwin&quot;</td><td>24934</td><td>95798</td><td>664</td><td>241</td><td>171</td><td>441</td></tr><tr><td>&quot;Barbour&quot;</td><td>4158</td><td>5606</td><td>41</td><td>14</td><td>13</td><td>23</td></tr><tr><td>&quot;Bibb&quot;</td><td>1619</td><td>7572</td><td>35</td><td>11</td><td>4</td><td>16</td></tr><tr><td>&quot;Blount&quot;</td><td>2576</td><td>25354</td><td>113</td><td>33</td><td>39</td><td>48</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 7)\n",
       "┌─────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐\n",
       "│ County  ┆ G24PREDHAR ┆ G24PRERTRU ┆ G24PREIKEN ┆ G24PREIOLI ┆ G24PREISTE ┆ G24PREOWRI │\n",
       "│ ---     ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        │\n",
       "│ str     ┆ i64        ┆ i64        ┆ i64        ┆ i64        ┆ i64        ┆ i64        │\n",
       "╞═════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡\n",
       "│ Autauga ┆ 7439       ┆ 20484      ┆ 154        ┆ 65         ┆ 48         ┆ 91         │\n",
       "│ Baldwin ┆ 24934      ┆ 95798      ┆ 664        ┆ 241        ┆ 171        ┆ 441        │\n",
       "│ Barbour ┆ 4158       ┆ 5606       ┆ 41         ┆ 14         ┆ 13         ┆ 23         │\n",
       "│ Bibb    ┆ 1619       ┆ 7572       ┆ 35         ┆ 11         ┆ 4          ┆ 16         │\n",
       "│ Blount  ┆ 2576       ┆ 25354      ┆ 113        ┆ 33         ┆ 39         ┆ 48         │\n",
       "└─────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Counties come directly from precinct votes — no distribution needed\n",
    "county_votes = (\n",
    "    matched_votes\n",
    "    .group_by(\"County\")\n",
    "    .agg([pl.col(c).sum() for c in VOTE_COLS])\n",
    "    .sort(\"County\")\n",
    ")\n",
    "\n",
    "print(f\"County vote table: {county_votes.shape[0]} rows\")\n",
    "county_votes.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr08",
   "metadata": {},
   "source": [
    "## 7. Format Output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "fmt01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Zip code output:\n",
      "shape: (3, 7)\n",
      "┌──────────┬──────────────┬──────────────┬────────────────────┬────────────────────┬─────────────────┬─────────────────┐\n",
      "│ Zip Code ┆ Dem. Popular ┆ Rep. Popular ┆ All Other Popular  ┆ Total Popular Vote ┆ Dem. Proportion ┆ Rep. Proportion │\n",
      "│ ---      ┆ Vote         ┆ Vote         ┆ Vote               ┆ ---                ┆ ---             ┆ ---             │\n",
      "│ i64      ┆ ---          ┆ ---          ┆ ---                ┆ str                ┆ f64             ┆ f64             │\n",
      "│          ┆ str          ┆ str          ┆ str                ┆                    ┆                 ┆                 │\n",
      "╞══════════╪══════════════╪══════════════╪════════════════════╪════════════════════╪═════════════════╪═════════════════╡\n",
      "│ 35004    ┆ 1,389        ┆ 4,439        ┆ 75                 ┆ 5,903              ┆ 0.235304        ┆ 0.751991        │\n",
      "│ 35005    ┆ 1,852        ┆ 1,334        ┆ 24                 ┆ 3,210              ┆ 0.576947        ┆ 0.415576        │\n",
      "│ 35006    ┆ 165          ┆ 1,458        ┆ 9                  ┆ 1,632              ┆ 0.101103        ┆ 0.893382        │\n",
      "└──────────┴──────────────┴──────────────┴────────────────────┴────────────────────┴─────────────────┴─────────────────┘\n",
      "\n",
      "County output:\n",
      "shape: (3, 7)\n",
      "┌─────────┬──────────────┬──────────────┬─────────────────────┬────────────────────┬─────────────────┬─────────────────┐\n",
      "│ County  ┆ Dem. Popular ┆ Rep. Popular ┆ All Other Popular   ┆ Total Popular Vote ┆ Dem. Proportion ┆ Rep. Proportion │\n",
      "│ ---     ┆ Vote         ┆ Vote         ┆ Vote                ┆ ---                ┆ ---             ┆ ---             │\n",
      "│ str     ┆ ---          ┆ ---          ┆ ---                 ┆ str                ┆ f64             ┆ f64             │\n",
      "│         ┆ str          ┆ str          ┆ str                 ┆                    ┆                 ┆                 │\n",
      "╞═════════╪══════════════╪══════════════╪═════════════════════╪════════════════════╪═════════════════╪═════════════════╡\n",
      "│ Autauga ┆ 7,439        ┆ 20,484       ┆ 358                 ┆ 28,281             ┆ 0.263039        ┆ 0.724303        │\n",
      "│ Baldwin ┆ 24,934       ┆ 95,798       ┆ 1,517               ┆ 122,249            ┆ 0.203961        ┆ 0.78363         │\n",
      "│ Barbour ┆ 4,158        ┆ 5,606        ┆ 91                  ┆ 9,855              ┆ 0.421918        ┆ 0.568848        │\n",
      "└─────────┴──────────────┴──────────────┴─────────────────────┴────────────────────┴─────────────────┴─────────────────┘\n"
     ]
    }
   ],
   "source": [
    "def format_election_output(df: pl.DataFrame, location_col: str, out_location_col: str) -> pl.DataFrame:\n",
    "    \"\"\"\n",
    "    Transform raw vote columns into the standard output format:\n",
    "    - Map candidate columns to standard names\n",
    "    - Compute All Other, Total, proportions\n",
    "    - Format vote counts with comma separators\n",
    "    \"\"\"\n",
    "    result = df.select(\n",
    "        pl.col(location_col).alias(out_location_col),\n",
    "        pl.col(DEM_COL).alias(\"_dem\"),\n",
    "        pl.col(REP_COL).alias(\"_rep\"),\n",
    "        pl.sum_horizontal(OTHER_COLS).alias(\"_other\"),\n",
    "    )\n",
    "\n",
    "    # Compute total and proportions\n",
    "    result = result.with_columns(\n",
    "        (pl.col(\"_dem\") + pl.col(\"_rep\") + pl.col(\"_other\")).alias(\"_total\")\n",
    "    ).with_columns([\n",
    "        (pl.col(\"_dem\") / pl.col(\"_total\")).alias(OUT_DEM_P),\n",
    "        (pl.col(\"_rep\") / pl.col(\"_total\")).alias(OUT_REP_P),\n",
    "    ])\n",
    "\n",
    "    # Format vote counts with comma separators\n",
    "    result = result.with_columns([\n",
    "        pl.col(\"_dem\").map_elements(lambda x: f\"{x:,}\", return_dtype=pl.Utf8).alias(OUT_DEM),\n",
    "        pl.col(\"_rep\").map_elements(lambda x: f\"{x:,}\", return_dtype=pl.Utf8).alias(OUT_REP),\n",
    "        pl.col(\"_other\").map_elements(lambda x: f\"{x:,}\", return_dtype=pl.Utf8).alias(OUT_OTHER),\n",
    "        pl.col(\"_total\").map_elements(lambda x: f\"{x:,}\", return_dtype=pl.Utf8).alias(OUT_TOTAL),\n",
    "    ])\n",
    "\n",
    "    # Select final columns in order\n",
    "    return result.select([\n",
    "        out_location_col,\n",
    "        OUT_DEM, OUT_REP, OUT_OTHER, OUT_TOTAL,\n",
    "        OUT_DEM_P, OUT_REP_P,\n",
    "    ])\n",
    "\n",
    "\n",
    "zip_output    = format_election_output(zip_votes, \"ZipCode\", \"Zip Code\")\n",
    "county_output = format_election_output(county_votes, \"County\", \"County\")\n",
    "\n",
    "print(\"Zip code output:\")\n",
    "print(zip_output.head(3))\n",
    "print(f\"\\nCounty output:\")\n",
    "print(county_output.head(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr09",
   "metadata": {},
   "source": [
    "## 8. Validation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "val01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "County totals match precinct sums: True\n",
      "\n",
      "Zip vs Precinct totals (rounding differences):\n",
      "  G24PREDHAR: precinct=772,412, zip=772,412, diff=0\n",
      "  G24PRERTRU: precinct=1,462,616, zip=1,462,607, diff=9\n",
      "  G24PREIKEN: precinct=12,075, zip=12,064, diff=11\n",
      "  G24PREIOLI: precinct=4,930, zip=4,921, diff=9\n",
      "  G24PREISTE: precinct=4,319, zip=4,316, diff=3\n",
      "  G24PREOWRI: precinct=8,738, zip=8,740, diff=2\n",
      "\n",
      "Overall: precinct=2,265,090, zip=2,265,060, diff=30\n",
      "Rounding error: 0.0013%\n"
     ]
    }
   ],
   "source": [
    "# --- County totals must match precinct sums exactly ---\n",
    "precinct_total = matched_votes.select([pl.col(c).sum() for c in VOTE_COLS])\n",
    "county_total   = county_votes.select([pl.col(c).sum() for c in VOTE_COLS])\n",
    "\n",
    "county_match = True\n",
    "for c in VOTE_COLS:\n",
    "    p = precinct_total[c].item()\n",
    "    ct = county_total[c].item()\n",
    "    if p != ct:\n",
    "        county_match = False\n",
    "        print(f\"  MISMATCH {c}: precinct={p}, county={ct}\")\n",
    "\n",
    "print(f\"County totals match precinct sums: {county_match}\")\n",
    "assert county_match, \"County aggregation does not match precinct sums!\"\n",
    "\n",
    "# --- Zip totals should approximately match (small rounding diff) ---\n",
    "zip_total = zip_votes.select([pl.col(c).sum() for c in VOTE_COLS])\n",
    "\n",
    "print(\"\\nZip vs Precinct totals (rounding differences):\")\n",
    "for c in VOTE_COLS:\n",
    "    p = precinct_total[c].item()\n",
    "    z = zip_total[c].item()\n",
    "    diff = abs(p - z)\n",
    "    print(f\"  {c}: precinct={p:,}, zip={z:,}, diff={diff}\")\n",
    "\n",
    "# Total across all vote columns\n",
    "total_precinct = sum(precinct_total[c].item() for c in VOTE_COLS)\n",
    "total_zip      = sum(zip_total[c].item() for c in VOTE_COLS)\n",
    "overall_diff   = abs(total_precinct - total_zip)\n",
    "print(f\"\\nOverall: precinct={total_precinct:,}, zip={total_zip:,}, diff={overall_diff}\")\n",
    "print(f\"Rounding error: {overall_diff / total_precinct * 100:.4f}%\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr10",
   "metadata": {},
   "source": [
    "## 9. Save"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "save01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saved to Process/:\n",
      "  GeneralElectionVotes_ALCounties.csv  (67 rows)\n",
      "  GeneralElectionVotes_ALZipCodes.csv  (635 rows)\n"
     ]
    }
   ],
   "source": [
    "county_output.write_csv(\"GeneralElectionVotes_ALCounties.csv\")\n",
    "zip_output.write_csv(\"GeneralElectionVotes_ALZipCodes.csv\")\n",
    "\n",
    "print(\"Saved to Process/:\")\n",
    "print(f\"  GeneralElectionVotes_ALCounties.csv  ({county_output.height} rows)\")\n",
    "print(f\"  GeneralElectionVotes_ALZipCodes.csv  ({zip_output.height} rows)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hdr11",
   "metadata": {},
   "source": [
    "## 10. Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "sum01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================================\n",
      "  Election Vote Distribution — Summary\n",
      "============================================================\n",
      "\n",
      "Input precincts (votes):    1,947\n",
      "Input precincts (vehicles): 1,947\n",
      "Matched precincts:          1,946\n",
      "Unmatched (votes only):     1\n",
      "\n",
      "Output counties:  67\n",
      "Output zip codes: 635\n",
      "\n",
      "County totals match exactly: True\n",
      "Zip rounding diff (all cols): 30 votes\n"
     ]
    }
   ],
   "source": [
    "print(\"=\" * 60)\n",
    "print(\"  Election Vote Distribution — Summary\")\n",
    "print(\"=\" * 60)\n",
    "print(f\"\\nInput precincts (votes):    {votes.shape[0]:,}\")\n",
    "print(f\"Input precincts (vehicles): {vehicles['PrecinctID'].n_unique():,}\")\n",
    "print(f\"Matched precincts:          {len(matched):,}\")\n",
    "print(f\"Unmatched (votes only):     {len(in_vote_only):,}\")\n",
    "print(f\"\\nOutput counties:  {county_output.height}\")\n",
    "print(f\"Output zip codes: {zip_output.height}\")\n",
    "print(f\"\\nCounty totals match exactly: {county_match}\")\n",
    "print(f\"Zip rounding diff (all cols): {overall_diff} votes\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
